Abstract

We present a simple, efficient, and self-contained construction of a wait-free regular register 
from Byzantine storage components. Our construction utilizes a novel building block, called a
1-regular register, which can be implemented from Byzantine fault-prone components with the
same round complexity as a safe register, and with only a slight increase in storage space. 

1 Introduction

In this paper we consider the problem of constructing wait-free distributed storage from Byzantine
fault-prone components in asynchronous settings. Our specific objective is to devise a solution that
is simple, efficient, and feasible in practice, yet provides meaningful semantics for higher-level
applications. Roughly speaking, a wait-free object is one that always guarantees the liveness of
shared memory operations, even in the presence of any number of process (client) failures.

Constructing efficient storage solutions from Byzantine components has received considerable
attention recently, as such solutions are useful in a number of emerging application domains. Originally,
such solutions have been introduced in the context of scalable client-server systems. Such systems 
achieve more scalability than traditional state-machine replication approaches by removing the di- 
rect communication among the servers, and thus reducing the load that each client request (or 
transaction) imposes on the servers. This approach was pioneered in the Fleet [17] system, and 
adopted in many others, e.g., SBQ-L [18], Agile Store [11], Coca [28], and [5]. Similar solutions have 
also been employed in the setting of Storage Area Networks (SANs). SAN technology allows clients 
to access disks directly over the network so that the file server bottleneck is eliminated. Examples 
of SAN-based systems that use disks for information sharing and coordination include Compaq's 
Petal [13] and Frangipani [23], Disk Paxos [7], Active Disk Paxos [6], and Byzantine Disk Paxos [1]. 
More recently, solutions of this nature have been adopted in peer-to-peer systems, which consist of 
a collection of widely spread nodes storing data objects. Naturally, due to their Internet-wide de- 
ployment, the storage nodes are prone to malicious attacks, which motivates adopting a Byzantine 
failure model for the storage nodes. Examples of peer-to-peer systems that adopt storage-centric 
replication to support availability in face of Byzantine failures include Rosebud [21] and [14]. 

Fault-prone storage systems such as those mentioned above can be formally modeled as an asynchronous
shared memory system where a threshold t of the memory objects may fail by being non-responsive [3,
10] or by returning arbitrary values [2, 10] (i.e., by being Byzantine); this failure model is called
non-responsive arbitrary (NR-Arbitrary) faults [10]. In this paper, we assume that fewer than one
fourth of the memory objects can fail. In [1] we show that this assumption is necessary for achieving
wait-free implementations as efficient as those presented in this paper; that is, every construction
that uses fewer than 4t + 1 objects must have a higher latency.

All existing wait-free Byzantine-resilient storage constructions provide safe register semantics
[10, 16, 1] (see footnote 1). The only previous direct constructions of objects with stronger (regular or atomic)
semantics from Byzantine storage satisfy weaker (non-wait-free) termination conditions [18, 1]. In 
this paper we focus on wait-free constructions. Safe register semantics, by themselves, are too weak 
to be directly useful for applications. The focus on these semantics has been justified by the exis- 
tence of known reductions from wait-free safe registers to stronger ones [19, 27, 8, 12, 24, 25, 26, 9]. 
However, this approach results in constructions that are not self-contained, and as we argue below, 
are not tailored to the requirements of a distributed storage system. In this paper, we do not 
use these reductions as black boxes, but instead capitalize on their techniques in order to derive 
a new self-contained wait-free regular register construction that is simple, efficient, and feasible in 
distributed storage environments. 

Footnote 1: A safe register guarantees that every read operation that does not overlap any write returns the latest written
value, or the initial value if no value was written; the result of a read operation that does overlap a write operation
may be arbitrary.

Most existing constructions of strong memory objects are fairly elaborate. We believe that a
major reason for this complexity is the fact that they aim to achieve an atomic register. However,
recent studies indicate that storage with regular semantics is sufficient in most cases [7, 22, 1]. A 
regular register is weaker than an atomic one; roughly speaking, a regular register guarantees that 
every read operation returns the value that was written by a write operation invoked not earlier 
than the last write operation that returns before the read is invoked, or the initial value if no value 
is written before the read. We therefore focus on constructing regular registers in this paper. 

Existing constructions of strong (regular or atomic) wait-free objects from weaker ones were not 
designed with distributed storage in mind. In particular, such constructions have typically focused 
on bounding the memory size rather than reducing the number of shared memory accesses. In a 
distributed setting, however, every memory access incurs a latency of two message delays, whereas 
storage space is typically abundant. Therefore, we believe that a practical construction for the 
model we consider herein should focus on simplicity and reducing communication costs, even at 
the cost of using unbounded counters. This is precisely the approach we take in this paper. We
give an algorithm that uses unbounded counters (which is acceptable in practice), achieves better
latency than all existing regular register implementations, and is very simple to understand and
implement.

Our approach further differs from existing constructions in the basic building blocks employed. 
Traditional shared memory constructions often use a collection of safe single-bit registers, which 
are mathematical models of flip-flops. In contrast, in a distributed storage setting, we can assume 
that each storage unit (representing a disk or a server) can support stronger objects. In this paper
we introduce a novel intermediate building block, called a 1-regular register. We show that a
1-regular register can be implemented from Byzantine fault-prone components with the same round
complexity as a safe register, and with only a slight increase in storage space. We then give a simple
and efficient implementation of a wait-free regular register using 1-regular ones.

Outline: The rest of this paper is organized as follows: In Section 3, we describe the formal 
computation model used throughout the paper and give the definitions of various register types. 
In Section 4, we construct a wait-free 1-regular register from n > 4t base registers up to t of which 
can incur Byzantine faults. Finally, in Section 5, we show how to use 1-regular registers in order to 
construct a wait-free regular register. We note that all but one of the 1-regular registers employed in 
this construction can be replaced with (weaker) safe ones. Therefore, for completeness, we present 
a direct safe register construction in the appendix. As noted above, using safe registers instead of 
1-regular ones does not reduce the latency, but it slightly reduces the space requirements. 

2 Related Work 

Wait-free shared register constructions have been an actively researched area for several decades [19, 
27, 8, 12, 24, 25, 26, 9]. Most of the constructions found in the literature aim to implement 
atomic registers from safe bits. One notable exception is the construction by Lamport in [12],
which implements an n-valued regular register using O(n) safe bits. The construction of [27] was
purported to be an atomic register construction, but in fact, only provides regular semantics. 

Peterson [19] and Tromp [24] present constructions of atomic registers from safe bit tracks 
and atomic control bits. It appears that these constructions can be easily adapted to implement 
regular registers by replacing the atomic control bits with regular ones, which in turn can be easily 
obtained from safe bits as shown in [12]; nevertheless, this was neither claimed nor proven in those
papers. Although both of these constructions have logarithmic space complexity, the number of
shared memory accesses they employ is rather high. Therefore, they are not directly applicable in 
a distributed setting. Nonetheless, our work benefits from several techniques and ideas underlying 
these constructions, e.g., using separate tracks to store copies of the register value, and using a 
handshake mechanism to coordinate between the reader and the writer. 

All the existing wait-free Byzantine-fault-tolerant register constructions for the distributed storage
setting provide only safe semantics [16, 10, 1]. Other constructions achieve stronger semantics
at the cost of weaker termination guarantees: the protocols by [18] and [4] implement atomic and 
regular registers, respectively, but do not guarantee termination in the face of client crashes; and the 
algorithm of [1] implements a regular register where the read operation is guaranteed to terminate 
only if it eventually runs in isolation for sufficiently long. 

Our 1-regular register notion is a generalization of the pseudo-regular register of [20]. Whereas
a read operation on our 1-regular register can return ⊥ only if it is concurrent with more than one write,
a read from the pseudo-regular register is allowed to return ⊥ if it is concurrent with any number
of writes. Hence, the guarantees it provides correspond to 0-regularity in our terminology.

3 The System Model 

We consider asynchronous shared memory systems consisting of a collection of processes interacting 
with a finite collection of objects. Objects and processes are modeled as I/O automata [15]; due to
space constraints, we do not repeat the details of the I/O model here.

An object automaton's interface is determined by its type, which is a tuple consisting of the
following components: (1) a set Vals of values; (2) a set of invocations; (3) a set of responses; and
(4) a sequential specification, which is a function from invocations × Vals to responses × Vals.
In a shared memory system consisting of processes P_1, P_2, ..., an object of type T interacts with
a process P_i by means of input actions of the form a_i, where a is an invocation of T, and output
actions of the form b_i, where b is a response of T. An object's external behavior is specified in
terms of the properties of its traces (i.e., executions consisting of external actions only). Liveness
properties are required to hold only in fair executions, i.e., executions where each output and
internal action has infinitely many opportunities to occur.

The interaction between a process and an object is well-formed if it consists of alternating 
invocations and responses, starting from an invocation. We only consider systems where interactions 
between processes and objects are well-formed. Well-formedness allows an invocation occurring in
an execution α to be paired with a unique response (when one exists). If an invocation has a
response in α, the invocation is complete; otherwise, it is incomplete. Note that well-formedness
does not rule out concurrent operation invocations on the same object by different processes. Nor 
does it rule out parallel invocations by the same process on different objects, which can be performed 
in separate threads of control. 

A threshold t of the objects may suffer NR-Arbitrary failures [10], i.e., may fail to respond to
an invocation, or may respond with an arbitrary value. Any number of the processes may fail by
stopping.



3.1 Registers 

A read/write register (or simply, register) type supports an arbitrary set Vals of values with an
arbitrary initial value v_0 ∈ Vals. Its invocations are read and write(v), v ∈ Vals. Its responses
are v ∈ Vals and ack. Its sequential specification, f, requires that every write overwrites the last
value written and returns ack (i.e., f(write(v), w) = (ack, v)); and every read returns the last value
written (i.e., f(read, v) = (v, v)). In a shared memory system consisting of processes P_1, P_2, ..., a
process P_i interacts with a shared register by means of input actions of the form read_i and write(v)_i,
and output actions of the form v_i and ack_i. A read/write register is called k-reader/m-writer if
only k (resp. m) processes are allowed to read (resp. write) the register. We use the term multi-reader
when the particular number of readers is not important.
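
As an illustration only, the sequential specification of a register can be sketched as a pure function from (invocation, state) to (response, new state). The sketch is in Python; the name `seq_spec`, the helper `run_sequential`, and the tuple encoding of invocations are our own assumptions and do not appear in the model above.

```python
from typing import Any, List, Tuple

ACK = "ack"  # hypothetical encoding of the ack response


def seq_spec(invocation: Tuple, state: Any) -> Tuple[Any, Any]:
    """Sequential specification of a read/write register.

    invocation is ("read",) or ("write", v); state is the current value.
    Returns the pair (response, new_state).
    """
    if invocation[0] == "write":
        # A write overwrites the stored value and answers ack.
        return ACK, invocation[1]
    if invocation[0] == "read":
        # A read answers with the stored value and leaves it unchanged.
        return state, state
    raise ValueError("unknown invocation")


def run_sequential(invocations: List[Tuple], v0: Any) -> List[Any]:
    """Apply a sequence of invocations one after another, starting from v0."""
    state, responses = v0, []
    for inv in invocations:
        resp, state = seq_spec(inv, state)
        responses.append(resp)
    return responses
```

For instance, reading, writing 5, and reading again from initial value 0 yields the responses 0, ack, 5 in that order.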

We now define several register properties that will be used throughout the paper. Fix x to be
a single-writer/multi-reader (SWMR) or single-writer/single-reader (SWSR) register, and let α be
a sequence of invocations and responses of x.

Safe register. α is safe [12] if every complete read operation that does not overlap any write
operation returns the register's value when the read was invoked (i.e., the latest written value or
the initial value v_0 if no value was written). A register is called safe if it has only safe traces.

Regular register. α is regular [12] if it is safe, and in addition, a read operation that does overlap
some write operations returns either one of the values written by the overlapping writes or the
register's value before the first overlapping write is invoked. A register is regular if it has only
regular traces.

1-regular register. α is 1-regular if it is safe, and in addition, a read operation that overlaps at most
one write operation returns either the value written by the overlapping write or the register's value
before the overlapping write is invoked. Otherwise, the read operation may, in addition to the values
allowed by regularity, return a special value ⊥. A register is 1-regular if it has only 1-regular traces.
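
To make the three definitions concrete, the following Python sketch computes the set of values a single read may return under each of them. It is a simplified model under our own assumptions: operations are given as real-time intervals, the writes come from a single writer and are listed in the order they were issued, and the names `Write`, `allowed_values` and `BOTTOM` are hypothetical. A result of `None` means that the definition places no constraint on the returned value.

```python
import math
from typing import Any, List, NamedTuple, Optional, Set

BOTTOM = object()  # stands for the special value ⊥


class Write(NamedTuple):
    inv: float    # invocation time
    resp: float   # response time (math.inf if the write never completes)
    value: Any


def allowed_values(writes: List[Write], r_inv: float, r_resp: float,
                   v0: Any, semantics: str) -> Optional[Set[Any]]:
    """Values a read spanning [r_inv, r_resp] may return.

    semantics is "safe", "regular" or "1-regular".
    """
    if semantics not in ("safe", "regular", "1-regular"):
        raise ValueError("unknown semantics")
    # Writes that completed before the read was invoked.
    preceding = [w for w in writes if w.resp < r_inv]
    # Writes that neither precede nor follow the read.
    overlapping = [w for w in writes
                   if not (w.resp < r_inv) and not (r_resp < w.inv)]
    last = preceding[-1].value if preceding else v0
    if not overlapping:
        return {last}            # all three definitions agree here
    if semantics == "safe":
        return None              # arbitrary result under concurrency
    allowed = {last} | {w.value for w in overlapping}
    if semantics == "1-regular" and len(overlapping) > 1:
        allowed.add(BOTTOM)      # ⊥ is permitted only with two or more overlapping writes
    return allowed
```

With writes a, b, c issued over the intervals [0, 1], [2, 5], [6, 9], a read over [3, 4] overlaps only b, so both the regular and the 1-regular definitions allow exactly a and b; a read over [3, 7] overlaps b and c, so the 1-regular definition additionally allows ⊥.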

Wait Freedom. Register x is wait-free if in any fair execution of any shared memory system that 
includes x, every invocation of x by a correct process is complete. 

4 Wait-free 1-Regular Register Construction 

The implementation of a wait-free SWMR 1-regular register from n > 4t wait-free SWMR regular
registers, up to t of which can incur NR-Arbitrary failures, is depicted in Figure 1. The notation
invoke write(x_i, v) (respectively, invoke tmp ← read(x_i)) means that a new thread is started
that performs a write on register x_i with value v (respectively, a read of register x_i whose response
will be stored in local variable tmp). The notation x_i responded means that the last thread
created by an invoke operation has completed its execution on register x_i. Note that this is well
defined because we maintain well-formedness using control flags pending and enabled in such a
manner that there is at most one pending thread for each register. Each of the n base registers x_i
consists of a cyclic two-value buffer whose elements are denoted x_i[0] and x_i[1], respectively. Each
write operation WRITE(v) first chooses a unique monotonically increasing timestamp ts and then
writes the pair (ts, v) to the base registers x_i, such that the kth write operation updates the
(k mod 2)th part of x_i. This mechanism ensures that every read operation that is overlapped by at
most one write operation will always be able to recover a good value (i.e., a value that upholds
regularity) from the values read from either part of the registers, as explained below.
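
The alternation between the two parts of the buffer can be sketched as follows. This is a deliberately simplified, failure-free and sequential Python model of the writer only: every base register is updated immediately, whereas the actual emulation waits for n − t acknowledgements. The class name `TwoSlotWriter` and the integer timestamps are our own assumptions. Writes are counted from 0 here, so that the first write goes to part 0.

```python
from typing import Any, List, Tuple

TSVal = Tuple[int, Any]  # a (timestamp, value) pair


class TwoSlotWriter:
    """Writer-side state of the two-value cyclic buffer (no faults, no asynchrony)."""

    def __init__(self, n: int, v0: Any, ts0: int = 0):
        self.w: List[TSVal] = [(ts0, v0), (ts0, v0)]  # local copy of both parts
        self.turn = 0                                 # part updated by the next write
        self.ts = ts0                                 # last timestamp used
        # Each base register holds a pair of TSVals; modelled here as a local list.
        self.base: List[Tuple[TSVal, TSVal]] = [(self.w[0], self.w[1])] * n

    def write(self, v: Any) -> str:
        self.ts += 1                        # choose a larger timestamp
        self.w[self.turn] = (self.ts, v)    # update only the current part
        pair = (self.w[0], self.w[1])
        self.base = [pair for _ in self.base]   # simplification: all registers respond
        self.turn = 1 - self.turn           # the next write uses the other part
        return "ack"
```

After writing a, b, c in this order, every base register holds ((3, c), (2, b)): the value of the previous write always survives in the part that the current write does not touch, which is what allows a read overlapped by a single write to recover a good value.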

The read implementation reads the values from at least n − t base registers and stores the values
read from the x_i[0] and x_i[1] components in the arrays w0[1..n] and w1[1..n], respectively. The
predicate safe(c, w), where c ∈ TSVals (i.e., a timestamp-value pair) and w is a vector of TSVals,
evaluates to true if c appears in at least t + 1 elements of w. A value c which is safe in either w0 or
w1 is known to be returned by at least one correct register x_i and therefore was written by some
previously invoked write. However, the safe predicate by itself is insufficient to ensure regularity,
since some old values could still be returned by t + 1 registers (e.g., if there are t registers which
are not up to date and 1 which is faulty). Hence, in order for a safe value c to be returned, we
must ensure that all values with a timestamp higher than c.ts are invalid, i.e., could not have been
written by a later complete write operation.

The validity check is accomplished by the predicate invalid(c, w), which returns true if and only
if for at least 2t + 1 elements w[j], w[j].ts < c.ts or w[j].ts = c.ts ∧ w[j].val ≠ c.val. Since every
completely written value can only be overwritten by a higher timestamped value, a value c is invalid
in w0 (resp. w1) if and only if it was either not written to at least n − t components x_i[0] (resp.
x_i[1]), or was not written at all. Hence, every value c which is both safe in w0 (resp. w1) and
for which all the higher timestamped values are invalid, is known to be not older than the value
written by the last write operation that writes x_i[0] (resp. x_i[1]) and completes before read is
invoked. Therefore, the highest timestamped such value is guaranteed to preserve regularity.
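
The predicates and the final selection step of the read can be sketched in Python as follows. This is a sketch of the decision logic only, under our own encoding: a (timestamp, value) pair is a tuple, `None` marks a base register that did not respond, `BOTTOM` stands for ⊥, and the function names (`higher_valid`, `candidates`, `read_result`) are hypothetical. Collecting the responses from the base registers is not modelled.

```python
from typing import Any, List, Optional, Set, Tuple

TSVal = Tuple[int, Any]              # (timestamp, value)
Vector = List[Optional[TSVal]]       # None marks a register that did not respond
BOTTOM = object()                    # stands for the special value ⊥


def safe(c: TSVal, w: Vector, t: int) -> bool:
    """c was reported by at least t + 1 registers, hence by a correct one."""
    return sum(1 for x in w if x == c) >= t + 1


def invalid(c: TSVal, w: Vector, t: int) -> bool:
    """At least 2t + 1 responses are older than c or conflict with c."""
    ts, v = c
    count = sum(1 for x in w
                if x is not None and (x[0] < ts or (x[0] == ts and x[1] != v)))
    return count >= 2 * t + 1


def higher_valid(c: TSVal, w: Vector, t: int) -> bool:
    """Some response carries a higher timestamp than c and is not invalid."""
    return any(x is not None and x[0] > c[0] and not invalid(x, w, t) for x in w)


def candidates(w: Vector, t: int) -> Set[TSVal]:
    """Safe values for which every higher-timestamped value is invalid."""
    return {x for x in w
            if x is not None and safe(x, w, t) and not higher_valid(x, w, t)}


def read_result(w0: Vector, w1: Vector, t: int) -> Any:
    """Value of the highest-timestamped candidate over both parts, or ⊥ if none."""
    cands = candidates(w0, t) | candidates(w1, t)
    if not cands:
        return BOTTOM
    return max(cands, key=lambda c: c[0])[1]
```

As a usage example with n = 5 and t = 1, suppose four registers respond: three report (1, a) in part 0 and the initial pair in part 1, and one faulty register reports (9, evil) in both parts. The forged pair is invalid because three responses carry lower timestamps, so the read returns a.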

The informal discussion above is formalized in the following two lemmas: 

Lemma 1. If READ returns v ≠ ⊥, then v is either a value that was written by an overlapping
WRITE, or the register's value before the first overlapping WRITE was invoked.

Proof. Let R be a READ operation that completes. Let v be the value written by the last WRITE
operation W that completes before R is invoked. By the WRITE implementation, there exists a
timestamp ts such that (ts, v) is written to either the x_i[0] or the x_i[1] component of at least n − t base
registers x_i. Hence, w.l.o.g. we can assume that (ts, v) was written to one of the two components,
say to x_i[0]. Since READ returns a value ≠ ⊥, by the READ code, C0 ∪ C1 ≠ ∅. There are two cases
to consider:

First, suppose that C0 = ∅ and C1 ≠ ∅. By line 5, this implies that either (1) ¬safe(c, w0) for
all c ∈ TSVals, or (2) for every c ∈ TSVals such that safe(c, w0) holds, higherValid(c, w0) is also
true. In both these cases, READ overlaps at least one complete WRITE operation that writes (ts', v')
with ts' > ts to the x_i[1] component of the base registers x_i. Hence, there are at most 2t base
registers x_i such that x_i[1].ts < ts or x_i[1].ts = ts ∧ x_i[1].val ≠ v. Thus, for any c' ∈ TSVals such
that c'.ts < ts, higherValid(c', w1) is true. Therefore, for each c1 ∈ C1, c1 was returned by at least
one correct register (safe(c1, w1)), and c1.ts > ts. Hence, c1.val was written by a WRITE operation
that follows W. Since C1 ≠ ∅, the return value val is equal to c1.val for some c1 ∈ C1. Hence,
regularity is maintained.

Finally, suppose that C0 ≠ ∅. In this case, there are at most 2t base registers x_i such that
x_i[0].ts < ts or x_i[0].ts = ts ∧ x_i[0].val ≠ v. Thus, for any c' ∈ TSVals such that c'.ts < ts,
higherValid(c', w0) is true. Since C0 ≠ ∅, for each c0 ∈ C0, c0 was returned by at least one
correct register (safe(c0, w0)), and c0.ts ≥ ts. Furthermore, by line 8, the value returned must have
been written with a timestamp which is at least as high as c0.ts. Hence, the return value upholds
regularity. □



Types: TSVals = TS × Vals, with selectors ts, val;

Shared objects: regular registers x_i ∈ TSVals × TSVals, 1 ≤ i ≤ n, whose
components are addressed by x_i[0] and x_i[1]; initially x_i = ((ts_0, v_0), (ts_0, v_0));

WRITE Emulation

Local variables:
    enabled[1..n], pending[1..n] ∈ Boolean,
        initially ∀i enabled[i] = pending[i] = false;
    w[2] ∈ TSVals, initially w[0] = w[1] = (ts_0, v_0);
    turn ∈ {0, 1}, initially 0;
    ts ∈ TS;

WRITE(v):
    choose ts ∈ TS larger than previously used;
    w[turn] ← (ts, v);
    for 1 ≤ i ≤ n, enabled[i] ← true;
    repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
    turn ← ¬turn;
    return ack;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        invoke write(x_i, (w[0], w[1]));
    if (∃i : x_i responded) then
        pending[i] ← false;

READ Emulation

Local variables:
    enabled[1..n], pending[1..n], old[1..n] ∈ Boolean,
        initially ∀i enabled[i] = pending[i] = false;
    w0[1..n], w1[1..n], tmp0[1..n], tmp1[1..n] ∈ TSVals;

Predicate and macro definitions:
    invalid((ts, v), w) ≡ |{i : w[i] = (ts', v') ∧ (ts' < ts ∨ (ts' = ts ∧ v ≠ v'))}| ≥ 2t + 1;
    safe(c, w) ≡ |{i : w[i] = c}| ≥ t + 1;
    higherValid(c, w) ≡ ∃i : w[i] = c' ∧ c'.ts > c.ts ∧ ¬invalid(c', w);

READ:
 1: for 1 ≤ i ≤ n, if (pending[i]) then old[i] ← true;
 2: for 1 ≤ i ≤ n, enabled[i] ← true;
 3: for 1 ≤ i ≤ n: w0[i] ← ⊥, w1[i] ← ⊥;
 4: repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
 5: C0 ← {c0 ∈ TSVals : safe(c0, w0) ∧ ¬higherValid(c0, w0)};
 6: C1 ← {c1 ∈ TSVals : safe(c1, w1) ∧ ¬higherValid(c1, w1)};
 7: if (C0 ∪ C1 ≠ ∅) then
 8:     return c.val : c.ts = max{c'.ts : c' ∈ C0 ∪ C1};
 9: return ⊥;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        invoke (tmp0[i], tmp1[i]) ← read(x_i);
    if (∃i : x_i responded) then
        if (¬old[i]) then
            (w0[i], w1[i]) ← (tmp0[i], tmp1[i]);
        pending[i] ← false; old[i] ← false;

Figure 1: The 1-regular register emulation.

Lemma 2. If R = READ returns ⊥, then R is concurrent with at least two WRITE operations.

Proof. Let W = WRITE(v) be the last WRITE operation that completes before R is invoked, and let ts
be the timestamp used to write v to the base registers. Suppose w.l.o.g. that (ts, v) was written
to the x_i[0] component of the base registers x_i. Assume by contradiction that R overlaps at
most one WRITE operation W' = WRITE(v'), and let ts' be the timestamp used to write v' to the
base registers. Since the base registers updated by W and W' intersect the base registers read by
R in at least t + 1 correct registers, either safe((ts, v), w0) or safe((ts', v'), w1) (or both) are true.
Moreover, there are at most 2t registers x_i such that x_i[0].ts < ts ∨ (x_i[0].ts = ts ∧ x_i[0].val ≠ v)
or x_i[1].ts < ts' ∨ (x_i[1].ts = ts' ∧ x_i[1].val ≠ v'). Hence, either ¬higherValid((ts, v), w0) or
¬higherValid((ts', v'), w1) is true. Therefore, C0 ∪ C1 ≠ ∅, so that ⊥ cannot be returned. A
contradiction. □
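
The intersection argument used in the proof rests on simple counting, which the following Python sketch spells out. The function name `quorum_bounds` is ours; the sketch only restates the arithmetic and assumes, as in the proof, that both the write and the read complete on n − t base registers.

```python
from typing import Tuple


def quorum_bounds(n: int, t: int) -> Tuple[int, int, int]:
    """Counting behind the quorum intersection, for n > 4t base registers.

    Returns (q, overlap, correct_overlap) where
      q               = number of registers an operation waits for (n - t),
      overlap         = minimum size of the intersection of two such sets,
      correct_overlap = minimum number of correct registers in that intersection.
    """
    if t < 0 or n <= 4 * t:
        raise ValueError("the construction assumes n > 4t")
    q = n - t
    overlap = 2 * q - n            # two sets of size n - t share at least n - 2t members
    correct_overlap = overlap - t  # at most t of the shared registers are faulty
    return q, overlap, correct_overlap
```

For n = 5 and t = 1 this gives sets of 4 registers that share at least 3 members, of which at least 2 = t + 1 are correct, which is exactly the threshold required by the safe predicate.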

The two lemmas above imply the following:

Lemma 3 (1-Regularity). All the traces of the algorithm in Figure 1 are 1-regular. 

Finally, both WRITE and READ are wait-free because neither one ever awaits more than n − t
replies and at least n − t base registers are responsive. We have thus proved the following:

Theorem 1. The algorithm in Figure 1 is an implementation of a wait-free 1-regular register from
n > 4t regular base registers, at most t of which can incur NR-Arbitrary faults.

5 Wait-free Regular Register Construction 

We now present a regular register construction from 1-regular and safe ones. We note that only one
of the registers in this construction needs to be 1-regular; it suffices for the remaining registers to
satisfy the weaker safe semantics. For completeness, we show in Appendix A a direct construction
of a wait-free safe register from n > 4t regular base registers, up to t of which can be Byzantine
faulty, using a technique similar to that of [16].

The algorithm in Figure 2 uses one wait-free 1-writer/m-reader 1-regular register, m 1-writer/1-reader
safe registers writeable by the writer, and m 1-writer/1-reader safe registers, each writeable by
one of the readers, to construct a wait-free 1-writer/m-reader regular register.

The read and write operations on the shared safe and 1-regular registers are denoted read and
write. We construct WRITE and READ operations that maintain regularity.

The writer's registers are of two kinds, called P and B for primary and backup, respectively. These
registers are used as buffers (or tracks) to store the values written by the writer. Each reader
process i, 1 ≤ i ≤ m, uses a 1-writer/1-reader multi-valued safe register RL_i to signal the writer
about a possible concurrent read in progress.

The construction works as follows: The writer starts by writing the primary register P (line
1). It then examines the reader registers RL_i to see whether the value of some of these registers
has changed since the last time it was read by the writer. If so, it saves the last written value
in the backup register B_j for every register RL_j where a change was observed (lines 4-6) before
returning ack (line 7).

The reader i starts by writing the register RL_i (line 2) and then proceeds to read the primary
track P (line 3). We consider the following two cases:

1. The read of the primary track by the reader (line 3) returns a value ≠ ⊥. In this case,
1-regularity of P ensures that the return value preserves regularity;

2. The read of the primary track by the reader (line 3) returns ⊥. In this case, 1-regularity
implies that read(P) by i is concurrent with at least two writes to P by the writer. Therefore,
the writer's code is executed in whole at least once since read(P) has been invoked. This
implies that the writer observes the change in the value of the register RL_i, and therefore
writes reader i's backup track exactly once before read(P) completes. Therefore, when reader
i eventually returns from read(P), it finds B_i already written, and therefore returns a correct
value.
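
The interplay between the primary track, the backup tracks and the signalling registers can be sketched with a small sequential Python model. It is an illustration under strong simplifying assumptions of ours: the shared registers are plain variables, there are no faults, and concurrency is imitated by an optional callback `read_p` that stands in for the read of P (it may run writer steps and then return ⊥). The class name `RegularRegister`, the callback, and the 0-based reader indices are hypothetical.

```python
from typing import Any, Callable, List, Optional

BOTTOM = object()  # stands for the special value ⊥


class RegularRegister:
    """Sequential model of the writer and the m readers of the construction."""

    def __init__(self, m: int, v0: Any):
        self.P: Any = v0                   # primary track (1-regular in the paper)
        self.B: List[Any] = [v0] * m       # backup track per reader
        self.RL: List[int] = [0] * m       # signalling register per reader
        self.wl: List[int] = [0] * m       # writer-local copy of last seen RL values
        self.rl: List[int] = [0] * m       # reader-local counters

    def write(self, v: Any) -> str:
        self.P = v                         # write the primary track first
        seen = list(self.RL)               # read every signalling register
        for i, a in enumerate(seen):
            if a != self.wl[i]:            # reader i started a READ since the last check
                self.wl[i] = a
                self.B[i] = v              # save the value in reader i's backup track
        return "ack"

    def read(self, i: int, read_p: Optional[Callable[[], Any]] = None) -> Any:
        self.rl[i] += 1
        self.RL[i] = self.rl[i]            # signal the writer before reading P
        x = read_p() if read_p is not None else self.P
        if x is BOTTOM:                    # P was overlapped by two or more writes
            x = self.B[i]                  # fall back to the backup track
        return x
```

In a run where reader 1 signals, the writer then performs two complete writes of b and c, and the read of P returns ⊥, the first of these writes observes the changed signalling register and stores b in the reader's backup track; the reader therefore returns b, a value written by an overlapping write.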



Shared objects:
    1-writer/1-reader safe registers RL_i ∈ Integers, writeable by the reader i,
        1 ≤ i ≤ m, and readable by the writer; initially 0;
    1-writer/m-reader 1-regular register P ∈ Vals ∪ {⊥}, writeable by the writer
        and readable by the readers; initially v_0;
    1-writer/1-reader safe registers B_i ∈ Vals, writeable by the writer and
        readable by the reader i; initially v_0;

Local to the writer:
    Static wl[1..m] ∈ Integers, initially ∀i wl[i] = 0;
    a_i ∈ Integers, for 1 ≤ i ≤ m;

Local to the reader i, 1 ≤ i ≤ m:
    Static rl_i ∈ Integers, initially 0;
    x ∈ Vals ∪ {⊥};

WRITE(v):
 1: write(P, v);
 2: a_i ← read(RL_i), for each i, 1 ≤ i ≤ m;
 3: if (∃i : a_i ≠ wl[i]) then
 4:     for each i, a_i ≠ wl[i]:
 5:         wl[i] ← a_i;
 6:         write(B_i, v);
 7: return ack;

READ by reader i, 1 ≤ i ≤ m:
 1: rl_i ← rl_i + 1;
 2: write(RL_i, rl_i);
 3: x ← read(P);
 4: if (x = ⊥) then
 5:     x ← read(B_i);
 6: return x;

Figure 2: The 1-writer/m-reader Wait-Free Regular Register Construction.



We observe that all the writes to the backup registers (line 6) can be executed in parallel.
Hence, in a distributed implementation, updating all the B_i's would incur only a single round-trip
message latency. This could be optimized even further by combining the writes targeted to the
same destinations within a single message.



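[Figure 3 (timeline, not reproduced): a READ R_i by reader i, consisting of R_i.write(RL_i), R_i.read(P) and R_i.read(B_i), overlapped by the WRITE operations W_1, W_2 and W_3; the steps W_1.write(P) and W_2.read(RL_i) are marked.]

Figure 3: Several write operations overlapping read.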

The above intuition is formalized by the following lemma: 

Lemma 4. Let READ overlap one or more WRITE operations. Then READ returns one of the values
written in an overlapping WRITE, or the value written in the latest WRITE operation preceding the
READ.



Proof. (We refer the reader to Figure 3 for intuition on this proof.)

Let R_i be a READ operation by process i. Let R_i consist of a write R_i.write(RL_i) to RL_i, a
read R_i.read(P) of P, and potentially a read R_i.read(B_i) of B_i.

For any WRITE operation W, we refer to specific operations within W as follows. We denote
by W.write(P) the first write operation (to P). We denote by W.read(RL_i) the read from RL_i,
by W.a_i the value returned from W.read(RL_i), and by W.wl the value of wl at the beginning of W. We
let W.write(B_i) denote the write to B_i, if it exists.

Let W_1 denote the latest WRITE such that W_1.write(P) terminates before R_i.write(RL_i) completes.
Let the two WRITE operations succeeding W_1 be denoted W_2 and W_3, respectively.

By choice of W_1.write(P), W_2.read(RL_i) strictly succeeds R_i.write(RL_i) (see Figure 2). Hence,
if no WRITE operation before W_2 sees the value written in R_i.write(RL_i), then W_2 must have
W_2.wl[i] ≠ W_2.a_i, and W_2.write(B_i) exists. Otherwise, some WRITE preceding W_2 already sees
R_i.write(RL_i) and writes B_i. In any case, at the latest when W_2 completes, a write to B_i completes.
Moreover, B_i is written at most once during R_i.write(RL_i). The value stored in B_i in this write is
the most up-to-date value written in a WRITE preceding or concurrent with R_i.

Now there are two possible cases. The first case is when R_i.read(P) returns a value other than
⊥. Then by 1-regularity, it returns a good value from P (i.e., it returns a non-⊥ value of an
overlapping WRITE or the latest WRITE preceding the READ).

The second case is when R_i.read(P) returns ⊥. Then by 1-regularity, R_i.read(P) overlaps
two WRITE operations. Since W_1 strictly precedes R_i.read(P), we have that W_2 and W_3 overlap
R_i.read(P). Hence, W_2 completes before R_i.read(P) terminates. As we show above, at the latest,
W_2 is a WRITE that sees R_i.write(RL_i) and updates B_i. Since R_i.write(RL_i) completes before
R_i.read(B_i) starts, and because no new write to B_i is invoked until R_i.read(B_i) returns, by safety
of B_i, R_i.read(B_i) returns a non-⊥ value that upholds regularity. □

Finally, if a READ operation R_i by a process i is not concurrent with any WRITE, then by
1-regularity of P, if there exists a WRITE operation preceding R_i, then R_i returns the value written
to P by the latest such WRITE. Otherwise, R_i returns v_0, the initial value of the register. Also,
the algorithm is wait-free since in a fair execution, it can never block forever in any
statement of the pseudocode. Hence, we proved the following:

Theorem 2. The algorithm in Figure 2 is an implementation of a 1-writer/m-reader wait-free
regular register from one 1-writer/m-reader wait-free 1-regular register and 2m 1-writer/1-reader
safe registers.

6 Conclusions 

We have presented a simple, efficient, and self-contained construction of a wait-free regular reg- 
ister from Byzantine components. This yields a practical building block for distributed storage 
applications tolerating Byzantine faults. 

A Wait-Free Safe Register Construction

In this section we present a wait-free SWMR safe register construction from n > 4t regular registers 
up to t of which can be Byzantine faulty (see Figure 4) . The implementation is based on techniques 
similar to those of [16]. 

We now prove that the algorithm in Figure 4 is a safe register construction: 

Lemma 5 (Safety). All traces of the algorithm in Figure 4 are safe. 

Proof. Let R be a READ operation that returns, and W = WRITE(v) be a WRITE operation that
returns before R is invoked, and assume that no WRITE operation W' that follows W is invoked
before R returns. We prove that R returns v, thereby upholding safety. Indeed, by the WRITE
code, write((ts, v)) terminates on at least n − t ≥ 3t + 1 base registers before R is invoked. Since
READ awaits responses from at least n − t registers, by regularity of x_i, there are at least t + 1
correct registers that respond with (ts, v) to the base object reads issued during R. Therefore,
upon completion of line 4, safe((ts, v)) is true. Thus, (ts, v) ∈ C after line 5. Finally, since no
WRITE operation that follows W is invoked before R returns, there might be at most t (faulty) base
registers that return (ts', v') with ts' > ts or ts' = ts ∧ v' ≠ v. Hence, (ts, v) is the only highest
timestamped value in C which is safe. By the return statement that selects the highest
timestamped element of C, v must be R's return value in this case. □

Finally, the construction is trivially wait-free since neither WRITE nor READ ever awaits
more than n − t responses and at least n − t registers are correct. Hence, we proved the following:

Theorem 3. The algorithm in Figure 4 is an implementation of a SWMR safe register from n > 4t
regular base registers, up to t of which can be Byzantine faulty.

Shared objects:
    SWMR regular registers x_i ∈ TSVals, 1 ≤ i ≤ n; initially x_i = (ts_0, v_0);

WRITE Emulation

Local variables:
    enabled[1..n], pending[1..n] ∈ Boolean,
        initially ∀i enabled[i] = pending[i] = false;
    w ∈ TSVals; ts ∈ TS;

WRITE(v):
    choose ts ∈ TS larger than previously used;
    w ← (ts, v);
    for 1 ≤ i ≤ n, enabled[i] ← true;
    repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
    return ack;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        invoke write(x_i, w);
    if (∃i : x_i responded) then
        pending[i] ← false;

READ Emulation

Local variables:
    enabled[1..n], pending[1..n], old[1..n] ∈ Boolean,
        initially ∀i enabled[i] = pending[i] = false;
    w[1..n], tmp[1..n] ∈ TSVals;

Predicate and macro definitions:
    safe(c) ≡ |{i : w[i] = c}| ≥ t + 1;

READ:
 1: for 1 ≤ i ≤ n, if (pending[i]) then old[i] ← true;
 2: for 1 ≤ i ≤ n, enabled[i] ← true;
 3: for 1 ≤ i ≤ n: w[i] ← ⊥;
 4: repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
 5: C ← {c ∈ TSVals : safe(c)};
 6: if (C ≠ ∅) then
 7:     return c.val : c.ts = max{c'.ts : c' ∈ C};
 8: return v_0;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        invoke tmp[i] ← read(x_i);
    if (∃i : x_i responded) then
        if (¬old[i]) then
            w[i] ← tmp[i];
        pending[i] ← false; old[i] ← false;

Figure 4: The SWMR safe register emulation.